Abstract


Severe acute respiratory syndrome (SARS) is an acute respiratory illness that broke out in China. SARS coronavirus (SARS_CoV), a novel human coronavirus, is known to be responsible for SARS infection. One of the major proteins associated with SARS_CoV, the SARS 3C-like protease (SARS_3CLpro), functions as a cysteine protease that cleaves the viral precursor polyprotein into a series of functional proteins required for coronavirus replication, and it is considered an appealing target for designing anti-SARS agents. To facilitate studies of the function and structure of SARS_3CLpro, in this report the synthetic genes encoding 3CLpro of SARS_CoV were assembled, and the expression plasmid was constructed using pQE30 as the vector and expressed in Escherichia coli M15 cells. The protease was expressed at high yield (~15 mg/L) and purified by NTA-Ni2+ affinity chromatography and an FPLC system, and its sequence was determined by LC/MS with a residue coverage of 46.4%.
© 2003 Elsevier Inc. All rights reserved. 
From the end of 2002 to June 2003, a severe epidemic disease called severe acute respiratory syndrome (SARS) broke out in China, and SARS infection also spread to more than 30 countries. Using biophysical and biochemical techniques such as electron microscopy, virus-discovery microarrays containing conserved nucleotide sequences characteristic of many virus families, randomly primed RT-PCR, and serological tests, it has been determined that SARS coronavirus (SARS_CoV) is responsible for SARS infection [1-3]. Coronaviruses are positive-stranded RNA viruses with a halo- or crown-like (corona) appearance when viewed under a microscope, and they have the largest genomes of any RNA virus known to date. Studies have suggested that SARS_CoV is a previously unknown coronavirus that is neither a mutant of any known coronavirus nor a recombinant of known coronaviruses; it is believed to be a novel human coronavirus that possibly originated from a non-human host [4,5].

Proteolytic processing of viral polyproteins is a key step in the replication cycle of many positive-strand RNA viruses, and this processing is performed by virus-encoded proteases [6,7]. The coronavirus replicase gene, which encodes the proteins required for coronavirus replication and transcription, encompasses more than 20,000 nucleotides [8,9] and encodes two overlapping polyproteins, pp1a (replicase 1a, ~450 kDa) and pp1ab (replicase 1ab, ~750 kDa), which contain the sequence motifs of both a papain-like cysteine protease and the 3C-like protease (3CLpro) [10,11]. Recently, the genome sequences of SARS_CoV from different SARS patients deposited in GenBank (http://www.ncbi.nlm.nih.gov/) have laid a solid foundation for research on SARS pathogenesis and anti-SARS drug design [12-14]. It has been demonstrated that the important proteins associated with SARS_CoV infection include the RNA polymerase, the spike (S) glycoprotein, the envelope (E) protein, the membrane (M) protein, the nucleocapsid (N) protein, and the main protease, 3C-like protease (3CLpro) [15,16]. As the viral main protease, 3CLpro controls the activities of the coronavirus replication complex [17,18].

Previous research has shown that the 3CLpro-mediated processing pathways are conserved in coronaviruses. Coronavirus main proteases employ conserved cysteine and histidine residues in the catalytic site and lack an acidic active-site residue [6,19-21]. The substrate specificities of coronavirus main proteases are also well defined: the known cleavage sites have bulky hydrophobic residues (mainly leucine) at the P2 position, glutamine at the P1 position, and small aliphatic residues at the P1' position [17,18]. In addition, the recently determined crystal structures of human coronavirus (strain 229E) 3CLpro and of an inhibitor complex of porcine coronavirus (transmissible gastroenteritis virus, TGEV) 3CLpro confirm a remarkable degree of conservation of the substrate-binding sites of coronavirus 3CLpro [16]. Studies have already shown that 3CLpro is a useful target for screening antiviral agents [18,22,23]. By analogy with other 3CLpro enzymes, SARS_3CLpro is expected to be an appealing target for discovering new agents for the treatment of SARS [16].
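The specificity rule above (Leu at P2, Gln at P1, a small residue at P1') lends itself to a simple motif scan. The sketch below is a hypothetical, simplified model for illustration only: the residue sets, the function names, and the toy sequence are assumptions, and Ser is added to the P1' set as an extra assumption even though the text mentions only small aliphatic residues. Real cleavage-site prediction would need more context than three positions.

```python
# Hypothetical, simplified motif model of coronavirus 3CLpro substrate preference:
# P2 = Leu (the text says "mainly leucine"), P1 = Gln, P1' = a small residue.
P2_RESIDUES = frozenset("L")
P1_RESIDUES = frozenset("Q")
P1_PRIME_RESIDUES = frozenset("AGS")  # Ala/Gly from the text; Ser is an added assumption


def candidate_cleavage_sites(sequence, p2=P2_RESIDUES, p1=P1_RESIDUES, p1_prime=P1_PRIME_RESIDUES):
    """Return 0-based indices i where the P1-P1' bond lies between sequence[i - 1] and sequence[i]."""
    sites = []
    for i in range(2, len(sequence)):
        if sequence[i - 2] in p2 and sequence[i - 1] in p1 and sequence[i] in p1_prime:
            sites.append(i)
    return sites


def split_at_sites(sequence, sites):
    """Cut the sequence at each predicted site and return the fragments in order."""
    fragments, start = [], 0
    for i in sites:
        fragments.append(sequence[start:i])
        start = i
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    toy = "MKAVLQSGFRKMAVTLQAGK"  # made-up sequence, for illustration only
    sites = candidate_cleavage_sites(toy)
    print(sites)                       # [6, 17]
    print(split_at_sites(toy, sites))  # ['MKAVLQ', 'SGFRKMAVTLQ', 'AGK']
```

Passing different residue sets through the keyword arguments lets the rule be loosened (for example, other bulky hydrophobic residues at P2) without changing the scan itself.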

Therefore, it is important to express and purify large amounts of SARS_3CLpro for structural and functional studies. In our previous work [24,25], we reported a 3D model of SARS_3CLpro and inhibitor design by virtual screening, as well as the cloning, expression, and purification of the E protein of SARS_CoV. In this article, we present the molecular cloning, expression, and purification of 3CLpro of SARS_CoV, together with a preliminary mass spectrometric characterization.


Materials and methods 
Chemicals, enzymes, and the vector pQE30


The restriction and modifying enzymes used in this work were purchased from TaKaRa, and the vector pQE30 and the bacterial strains M15 and DH5α were from Qiagen. Trizol and Superscript II reverse transcriptase were purchased from Gibco. Trypsin (sequencing grade) was purchased from Sigma. The chelating affinity column and the low molecular weight (LMW) marker were purchased from Amersham Pharmacia Biotech. All other chemicals were of analytical grade from Sigma.


Bacterial strains and culture media 


Escherichia coli DH5α was used for plasmid propagation. DH5α was maintained on LB agar plates and grown at 37°C, while M15 was cultured on LB agar plates containing kanamycin (25 mg/L). For agar plates, Bacto agar was added to the media to a final concentration of 1.5% (w/v). Ampicillin was added to the media at a final concentration of 100 mg/L for the selection of transformants. E. coli M15 was chosen as the host for gene expression. The strains were stored in LB medium containing 15% glycerol at −80°C. Where required, ampicillin and kanamycin were added to the media at final concentrations of 100 and 25 mg/L, respectively.
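The media concentrations above reduce to simple C1V1 = C2V2 arithmetic when antibiotics are added from concentrated stocks. A minimal sketch, assuming hypothetical 1000x stocks (100 mg/ml ampicillin, 25 mg/ml kanamycin); only the final concentrations and the 1.5% (w/v) agar come from the text, and the helper names are illustrative.

```python
# Final antibiotic concentrations (mg/L) are from the text; the stock
# concentrations (mg/ml) are hypothetical 1000x stocks chosen for illustration.
ANTIBIOTICS = {
    "ampicillin": {"final_mg_per_l": 100.0, "stock_mg_per_ml": 100.0},
    "kanamycin": {"final_mg_per_l": 25.0, "stock_mg_per_ml": 25.0},
}


def stock_volume_ml(final_mg_per_l, stock_mg_per_ml, medium_volume_l):
    """Volume of stock (ml) that delivers the required mass of antibiotic (C1*V1 = C2*V2)."""
    required_mg = final_mg_per_l * medium_volume_l
    return required_mg / stock_mg_per_ml


def agar_grams(percent_w_v, medium_volume_l):
    """Grams of agar for a % (w/v) concentration: 1% (w/v) = 10 g/L."""
    return percent_w_v * 10.0 * medium_volume_l


if __name__ == "__main__":
    for name, spec in ANTIBIOTICS.items():
        ml = stock_volume_ml(spec["final_mg_per_l"], spec["stock_mg_per_ml"], 1.0)
        print(f"{name}: {ml:.2f} ml stock per 1 L medium")
    print(f"Bacto agar for 1 L of 1.5% (w/v) plates: {agar_grams(1.5, 1.0):.1f} g")
```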


Cloning of the SARS_3CLpro gene in pQE30


All cloning techniques, including PCR, restriction digestion, ligation, E. coli transformation, and plasmid DNA preparation, were performed according to standard methods [26].

SARS_CoV (isolate BJ01) RNA was extracted with Trizol reagent according to the manufacturer's instructions (www.genehub.net/trizol.htm). Reverse transcription was performed by random priming with Superscript II reverse transcriptase. The SARS_3CLpro cDNA was then amplified by PCR using the following primers: 3CLf (5'-GGGGGATCCACCATGAGTGGTTTTAGGAAAATGGCA-3') and 3CLr (5'-GGGAAGCTTTTGGAAGGTAACACCAGAGCA-3'). After digestion with BamHI and HindIII, the PCR product was inserted into the BamHI and HindIII sites of the vector pQE30 (Qiagen). The residues of the expression tag are "MRGSHHHHHHGSTM". The SARS_3CLpro insert was verified by sequencing.
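One way to check that the primers keep the insert in frame with the pQE30 His tag is to translate the primer sequences in silico. The sketch below uses the standard genetic code and the two primer sequences given above; the helper functions are illustrative and are not part of the authors' stated workflow.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# third base varying fastest, which matches the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons starting at position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


FORWARD = "GGGGGATCCACCATGAGTGGTTTTAGGAAAATGGCA"  # 3CLf, 5'->3'
REVERSE = "GGGAAGCTTTTGGAAGGTAACACCAGAGCA"        # 3CLr, 5'->3'

if __name__ == "__main__":
    bamhi = FORWARD.find("GGATCC")      # BamHI site sets the reading frame of the pQE30 tag
    print(translate(FORWARD[bamhi:]))   # GSTMSGFRKMA
    sense = reverse_complement(REVERSE)
    print(translate(sense))             # CSGVTFQKLP
```

Translating from the BamHI site of 3CLf gives GSTMSGFRKMA, which matches residues 11-21 of the construct in Scheme 1 (the GS encoded by the BamHI site, then TM and the start of 3CLpro). The reverse-complemented 3CLr gives CSGVTFQKLP; CSGVTFQKL matches the C-terminal stretch in Scheme 1, while the final P comes from the GGG clamp outside the HindIII site and is not part of the cloned construct.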


Expression and purification of SARS_3CLpro


Escherichia coli M15 cells transformed with the plasmid pQE30-SARS_3CLpro were grown overnight at 37°C in 100 ml LB medium containing ampicillin (100 mg/L) and kanamycin (25 mg/L) and then inoculated into 1 L LB supplemented with both antibiotics. Expression of SARS_3CLpro was induced by adding 0.5 mM isopropyl β-D-thiogalactoside (IPTG). After induction for 5 h at 18°C, the cells were harvested by centrifugation at 4000g and 4°C for 30 min. The pellet was washed, frozen, and then disrupted by sonication in Buffer A (20 mM Tris-HCl, 0.5 M NaCl, and 5 mM imidazole, pH 8.0). The lysate was centrifuged at 14,000g and 4°C for 1 h; the supernatant was retained and the pellet discarded. A 1-ml HiTrap Ni2+ chelating column was equilibrated with 10 ml of sterile deionized water, 50 mM NiSO4, and finally 10 ml Buffer A. The supernatant was passed over the column at a flow rate of 5 ml/min, and the column was washed with 20 ml Buffer A followed by 20 ml Buffer B (80 mM imidazole in Buffer A). The protease was eluted with 10 ml Buffer C (20 mM Tris-HCl, 0.5 M NaCl, and 0.5 M imidazole, pH 8.0) and further purified by gel filtration on a HiTrap 16/60 Sephacryl S100 column pre-equilibrated with Buffer D (5 mM dithiothreitol, 150 mM NaCl, and 10 mM Tris-HCl, pH 7.5) using an FPLC system (Pharmacia). Highly purified His-tagged SARS_3CLpro was obtained with a yield of 15 mg/L.
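The buffer recipes above can be turned into weighing amounts, and because Buffers A and C differ only in imidazole, Buffer B (80 mM imidazole in Buffer A) could in principle be made by mixing them. A minimal sketch, assuming Tris is weighed as the Tris-HCl salt (157.60 g/mol) rather than as Tris base titrated with HCl; pH adjustment is not modeled, and the mixing approach is an illustration rather than the authors' stated procedure.

```python
# Molar masses (g/mol). Assumption: Tris is weighed as the Tris-HCl salt.
MOLAR_MASS = {
    "Tris-HCl": 157.60,
    "NaCl": 58.44,
    "imidazole": 68.08,
}

# Concentrations in mol/L, taken from the buffer definitions in the text.
BUFFER_A = {"Tris-HCl": 0.020, "NaCl": 0.5, "imidazole": 0.005}
BUFFER_C = {"Tris-HCl": 0.020, "NaCl": 0.5, "imidazole": 0.5}


def grams_needed(recipe_molar, volume_l):
    """Grams of each solute for a recipe given in mol/L."""
    return {name: conc * volume_l * MOLAR_MASS[name] for name, conc in recipe_molar.items()}


def blend_fraction(c_low, c_high, c_target):
    """Volume fraction of the high-concentration buffer that gives c_target on mixing."""
    return (c_target - c_low) / (c_high - c_low)


if __name__ == "__main__":
    for name, grams in grams_needed(BUFFER_A, 1.0).items():
        print(f"Buffer A, 1 L: {grams:.3f} g {name}")
    f = blend_fraction(BUFFER_A["imidazole"], BUFFER_C["imidazole"], 0.080)
    print(f"Buffer B: {f * 100:.1f}% Buffer C + {(1 - f) * 100:.1f}% Buffer A (v/v)")
```

For 1 L of Buffer A this gives about 3.152 g Tris-HCl, 29.220 g NaCl, and 0.340 g imidazole, and Buffer B corresponds to roughly 15.2% Buffer C in 84.8% Buffer A.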


In-gel digest and peptide extraction 


The in-gel digestion protocol used in this study was modified from the method described by Yu et al. [27]. The gel band of interest (SARS_3CLpro) was excised from the Coomassie-stained SDS-PAGE gel with a steel scalpel and destained in an Eppendorf tube by washing with 100 µl of 30% CH3CN/100 mM ammonium bicarbonate. The washing step was repeated until the gel band was clear. The gel band was then completely dried in a Speed-Vac vacuum centrifuge (Savant, Holbrook, NY) and cut into small pieces. The dried pieces were reswollen in about 30 µl of 50 mM ammonium bicarbonate (pH 8.3), the minimum volume needed to cover the gel pieces completely, and trypsin was added at an enzyme-to-sample ratio of 1:20 (w/w). The gel pieces were incubated at 37°C for 12-16 h. The tryptic peptides were extracted by adding 30 µl of 60% CH3CN/0.1% TFA and vortexing for 4 min before removing the solution; this extraction was performed three times with the same solution. The extracts were pooled in a 0.5-ml Eppendorf tube and evaporated to 10-20 µl in the Speed-Vac vacuum centrifuge.
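Trypsin cleaves on the C-terminal side of Lys and Arg, usually not when the next residue is Pro. The sketch below applies that classical rule to predict tryptic peptides and computes the trypsin amount for the 1:20 (w/w) ratio used above. It assumes complete digestion with no missed cleavages; the helper names and the 10 µg sample amount are hypothetical.

```python
def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (classical trypsin rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def trypsin_amount_ug(sample_ug, ratio=20.0):
    """Trypsin mass for an enzyme-to-sample ratio of 1:ratio (w/w)."""
    return sample_ug / ratio


if __name__ == "__main__":
    # First line of the Scheme 1 sequence (expression tag plus the start of 3CLpro).
    excerpt = "MRGSHHHHHHGSTMSGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDT"
    print(tryptic_digest(excerpt))
    print(trypsin_amount_ug(10.0), "ug trypsin for a hypothetical 10 ug of protein in the band")
```

On this excerpt the rule yields MR, GSHHHHHHGSTMSGFR, K, MAFPSGK, and VEGCMVQVTCGTTTLNGLWLDDT; the last fragment is cut short only because the excerpt ends there.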


LC-ion trap-MS and MS/MS 


The LC/MS system used for analyzing tryptic peptides consisted of an HP1100 LC system (Agilent, Cheshire, UK) coupled to an LCQ-DECA mass spectrometer (ThermoFinnigan, San Jose, CA). A microbore reversed-phase column (C8, 50 × 1.0 mm ID, 7 µm, ABI RP300) was used for LC separation. Solvent A was 0.1% FA in water and solvent B was 0.1% FA in CH3CN. The gradient started at 5% B, was held for 2 min, and then increased linearly to 80% B over 50 min. The peptide mixture was injected onto the column by an autosampler and separated at a flow rate of 200 µl/min. The eluate was monitored with a PDA detector (TSP UV6000) and introduced directly on-line into the ESI source. The operating conditions were optimized with the standard solution provided by the manufacturer, and the working parameters of the ion source were as follows: capillary temperature, 200°C; spray voltage, 5 kV; capillary voltage, 15 V; and sheath gas flow rate, 20 (arbitrary units). To acquire more mass spectra within each LC peak, two scan modes, full scan and data-dependent MS/MS, were used. The scan range was m/z 400 to m/z 2000, and the collision energy was set at 38%.
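Because electrospray produces [M+zH]z+ ions, a species of neutral mass M appears at m/z = (M + z × 1.007276)/z, where 1.007276 Da is the proton mass. The sketch below converts between mass and m/z and lists which charge states fall inside the m/z 400-2000 scan range; the function names are illustrative.

```python
import math

PROTON_MASS = 1.007276  # Da


def mz_from_mass(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral mass M from an observed m/z and its charge state."""
    return charge * (mz - PROTON_MASS)


def charge_states_in_window(neutral_mass, mz_min=400.0, mz_max=2000.0):
    """Charge states whose [M + zH]z+ ion falls inside [mz_min, mz_max]."""
    z_low = math.ceil(neutral_mass / (mz_max - PROTON_MASS))
    z_high = math.floor(neutral_mass / (mz_min - PROTON_MASS))
    return list(range(max(z_low, 1), z_high + 1))


if __name__ == "__main__":
    t3_mass = mass_from_mz(566.2, 2)
    print(f"T3 neutral mass: {t3_mass:.2f} Da")
    print("T3 charge states in scan range:", charge_states_in_window(t3_mass))
    zs = charge_states_in_window(35831.0)
    print(f"~35.8 kDa protein: {zs[0]}+ to {zs[-1]}+")
```

For example, the doubly charged T3 precursor at m/z 566.2 corresponds to a neutral peptide mass of about 1130.39 Da (observable as 1+ and 2+ in this window), while a protein of about 35.8 kDa can only appear in this window at charge states from 18+ to 89+.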


Results and discussion 


Construction of the expression vector pQE30-SARS_3CLpro


The SARS_3CLpro PCR product, verified by sequencing, was digested with BamHI and HindIII and then inserted into the BamHI and HindIII sites of the vector pQE30. E. coli M15 cells transformed with the plasmid pQE30-SARS_3CLpro were used for the expression of His-tagged SARS_3CLpro.


Expression and purification of SARS_3CLpro


Using the optimized method for expressing SARS_3CLpro in E. coli and purifying it, homogeneous protein was isolated in two chromatographic steps. The SDS-PAGE analysis of the purification of SARS_3CLpro is shown in Fig. 1, and the purification scheme for SARS_3CLpro expressed in E. coli transformed with pQE30-SARS_3CLpro (1 L culture) is listed in Table 1.

Fig. 1. SDS-PAGE analysis of the purification of SARS_3CLpro (1, marker; 2, supernatant; 3, pellet; 4, after the NTA-Ni2+ affinity column; 5, after purification on the FPLC system).

Table 1
Purification scheme of SARS_3CLpro expressed in E. coli transformed with pQE30-SARS_3CLpro (1 L culture)

Step                    Total protein (mg)   SARS_3CLpro (mg)   Purification factor (fold)   SARS_3CLpro yield (%)
Extraction              500                  50                 1                            100
Ni2+-affinity column    50                   40                 8                            80
Gel filtration          15                   15                 10                           30
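The purification factor and yield columns in Table 1 follow from the amount of SARS_3CLpro relative to total protein at each step. The sketch below recomputes them from the first two numeric columns, treating the fraction of 3CLpro in total protein as a proxy for specific activity (an assumption; no activity assay is described in the text).

```python
# (step, total protein in mg, SARS_3CLpro in mg), values from Table 1 (1 L culture)
STEPS = [
    ("Extraction", 500.0, 50.0),
    ("Ni2+-affinity column", 50.0, 40.0),
    ("Gel filtration", 15.0, 15.0),
]


def purification_summary(steps):
    """Return (step, purification factor, yield %) relative to the first step."""
    _, total0, target0 = steps[0]
    purity0 = target0 / total0           # fraction of target protein in the crude extract
    rows = []
    for name, total, target in steps:
        fold = (target / total) / purity0  # enrichment of the target fraction
        yield_pct = 100.0 * target / target0
        rows.append((name, round(fold, 1), round(yield_pct, 1)))
    return rows


if __name__ == "__main__":
    for name, fold, yield_pct in purification_summary(STEPS):
        print(f"{name:22s} {fold:5.1f}-fold  {yield_pct:5.1f}% yield")
```

It reproduces the 1-, 8-, and 10-fold purification factors and the 100%, 80%, and 30% yields listed in Table 1.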

These results show that the pQE30-SARS_3CLpro plasmid expressed in E. coli M15 cells produces a large amount of soluble SARS_3CLpro, and that the purification procedure is straightforward.


LC/MS and LC/MS/MS analysis of SARS_3CLpro and tryptic peptides


A database search using the MS/MS raw data of the tryptic peptides from the gel band identified 3CLpro as the top candidate, with a summary score of 456.5, much higher than that of the second candidate (score 46.5); eleven peptides (T1-T11) were matched to tryptic peptides of 3CLpro (Fig. 2). The MS/MS spectrum of the doubly charged precursor ion of peptide T3 at m/z 566.2 is shown as an example in Fig. 3. Most of the b and y ions (>80%) were detected. The eleven tryptic peptides contained a total of 145 amino acids (Scheme 1), corresponding to a protein coverage of 46.4%.

MRGSHHHHHHGSTMSGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDT
VYCPRHVICTAEDMLNPNYEDLLIRKSNHSFLVQAGNVQLRVIGHSMQNC
LLRLKVDTSNPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNHTIK
GSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGKFYGPFVDR
QTAQAAGTDTTITLNVLAWLYAAVINGDRWFELNRFTTTLNDFNLVAMKY
NYEPLTQDHVDILGPLSAQTGIAVLDMCAALKELLQNGMNGRTILGSTILE
DEFTPFDVVRQCSGVTFQKLN

Scheme 1. Sequence of SARS_3CLpro showing the coverage of the protein obtained by mass spectrometry of the in-gel tryptic digest. (Fragments of the sequence resolved by LC/MS/MS are shown in boldface and underlined, and the expression tag is highlighted in a box.)

Fig. 2. Base peak chromatogram of the peptides from LC/MS analysis of tryptic digests of SARS_3CLpro.

Fig. 3. MS/MS spectrum of the doubly charged precursor ion at m/z 566.2 (T3) at a retention time of 10.25 min. The sequence is confirmed by the labeled b- and y-ions in the spectrum. Ions observed in the spectrum are underlined and assigned.

Fig. 4. Multiply charged ions of SARS_3CLpro protease.

Fig. 5. Molecular weight of SARS_3CLpro protease.
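The b- and y-ion series used to confirm peptide sequences can be computed from monoisotopic residue masses: b_i is the sum of the first i residues plus a proton, and y_i is the sum of the last i residues plus water and a proton. The sketch below assumes singly charged fragments and unmodified residues (no cysteine alkylation) and matches observed peaks within a hypothetical tolerance of 0.5 m/z. The T3 sequence is not given in the text, so the peptide used here (MAFPSGK, a predicted tryptic peptide of the construct) is only a placeholder.

```python
# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, running = [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)
    y_ions, running = [], WATER
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + PROTON)
    return b_ions, y_ions


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]z+ precursor ion."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fraction_observed(theoretical, observed, tol=0.5):
    """Fraction of theoretical fragment m/z values matched by an observed peak within tol."""
    if not theoretical:
        return 0.0
    hits = sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))
    return hits / len(theoretical)


if __name__ == "__main__":
    peptide = "MAFPSGK"  # placeholder peptide, not necessarily one of T1-T11
    b, y = fragment_ions(peptide)
    print("b:", [round(x, 3) for x in b])
    print("y:", [round(x, 3) for x in y])
    print("[M+2H]2+:", round(precursor_mz(peptide, 2), 3))
```

A useful sanity check is that each complementary pair b_i + y_(n-i) equals the [M+2H]2+ m/z multiplied by two, since together the two fragments contain every residue, one water, and two protons.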

The molecular weight of 3CLpro was also determined with the LC-MS system, using the same LC conditions as described above for peptide separation. An LC peak was observed at a retention time of 18 min (data not shown), and the mass spectrum corresponding to this peak gave a series of multiply charged ions (Fig. 4). The molecular weight was obtained with the deconvolution algorithm in the Sequest program. The measured mass of 3CLpro protease is 35,831 Da (Fig. 5), only 1 Da from the theoretical mass (35,832 Da). These results confirm the identity of the SARS_3CLpro expressed in this work.
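A multiply charged ion series like the one in Fig. 4 can be deconvoluted by hand: if a peak at m/z m1 carries charge z and the adjacent lower-m/z peak m2 carries z+1, then z = (m2 − 1.007276)/(m1 − m2) and M = z(m1 − 1.007276). The sketch below applies this to consecutive peaks and averages the per-peak masses. It is a generic textbook approach, not the algorithm implemented in Sequest, and it assumes the peaks are consecutive charge states with no gaps; the peak list is synthetic, not data from Fig. 4.

```python
PROTON = 1.007276  # Da


def charge_from_adjacent(mz_high, mz_low):
    """Charge z of the higher-m/z peak, given that the adjacent lower-m/z peak carries z + 1."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def deconvolute(peaks):
    """Estimate neutral mass from consecutive charge-state peaks (at least two, no gaps)."""
    mzs = sorted(peaks, reverse=True)  # highest m/z carries the lowest charge
    z_first = charge_from_adjacent(mzs[0], mzs[1])
    masses = [(z_first + i) * (mz - PROTON) for i, mz in enumerate(mzs)]
    return sum(masses) / len(masses), z_first


if __name__ == "__main__":
    # Synthetic peaks for a 35,831 Da protein at charges 20+ to 25+.
    true_mass = 35831.0
    peaks = [(true_mass + z * PROTON) / z for z in range(20, 26)]
    mass, z_first = deconvolute(peaks)
    print(f"lowest charge {z_first}+, estimated mass {mass:.1f} Da")
    print(f"1 Da at 35,832 Da = {1 / 35832 * 1e6:.0f} ppm")
```

With noise-free synthetic peaks the method returns 20+ for the highest-m/z peak and recovers the input mass; for scale, the 1 Da difference reported above corresponds to about 28 ppm at 35,832 Da.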

In conclusion, we have constructed the plasmid pQE30-SARS_3CLpro and, using this plasmid in an E. coli expression system, obtained a large amount of purified His-tagged SARS_3CLpro protease by NTA-Ni2+ affinity chromatography. The protease obtained should be suitable for screening crystallization conditions for X-ray crystallographic analysis.